[ES-1804970] Fix CloudFetch returning stale column names from cached results#346
Open
sreekanth-db wants to merge 2 commits into databricks:main from
When the server result cache serves Arrow IPC files from a prior query, the embedded schema contains stale column aliases. The Go driver's CloudFetch path read these stale names directly, while the local path already used the authoritative schema from `GetResultSetMetadata`. Pass the authoritative schema bytes into `NewCloudBatchIterator` and replace stale column names on deserialized records using `array.NewRecord`, which is zero-copy (it shares the underlying column data).

Co-authored-by: Isaac
Signed-off-by: Sreekanth Vadigi &lt;sreekanth.vadigi@databricks.com&gt;
Summary
Fixes a bug where `arrow.Record.Schema()` returns stale column aliases when CloudFetch serves cached Arrow IPC files from a structurally identical prior query with different `AS` aliases.

Root cause: `NewCloudBatchIterator` was not receiving the authoritative schema bytes from `GetResultSetMetadata`, unlike the local batch path, which already had them. CloudFetch Arrow IPC files have column names baked in from the original query, and the driver was reading them as-is.

The fix: pass `arrowSchemaBytes` (the authoritative schema from `GetResultSetMetadata`) into `NewCloudBatchIterator`. After records are deserialized from the IPC stream, replace the stale schema with the authoritative one using `array.NewRecord()` (zero-copy — shares underlying column data, only swaps metadata).

Changes
- `arrowRecordIterator.go` — Pass `ri.arrowSchemaBytes` to `NewCloudBatchIterator` in `newBatchIterator()`
- `arrowRows.go` — Pass `schemaBytes` to `NewCloudBatchIterator` in `NewArrowRowScanner()`
- `batchloader.go` — Core fix:
  - `NewCloudBatchIterator` accepts `arrowSchemaBytes`, parses it into an `*arrow.Schema`, and stores it on `batchIterator`
  - `batchIterator.Next()` applies the override schema to CloudFetch records only (the local path is untouched; `overrideSchema` is `nil`)
  - New `schemaFromIPCBytes()` helper; failures are logged at `Warn` level
- `batchloader_test.go` — Added `TestCloudFetchSchemaOverride` with two subtests:
  - stale names `["id","name"]` are overridden to `["x","y"]`
  - `nil` schema bytes pass through original names unchanged

Who is affected
Go driver users with CloudFetch enabled (`WithCloudFetch(true)`) who read `arrow.Record.Schema()` directly. Python, ODBC, and JDBC drivers are not affected.

Test plan
- Unit tests (`internal/rows/arrowbased/`): `TestCloudFetchSchemaOverride` covers the override and no-override paths
- Manual E2E against `samples.tpch.lineitem` (~30M rows) with two queries differing only in column aliases — confirmed `arrow.Record.Schema()` now returns the correct aliases

This pull request was AI-assisted by Isaac.